Global disease surveillance platform, and corresponding system and method

ABSTRACT

A computer-implemented method for identifying and assessing public health events, and a corresponding system and apparatus, includes capturing public health-related information from structured and unstructured sources, where the information is contained in one or more documents, extracting meta-data from the captured public health-related information, creating an index of the extracted meta-data, archiving the meta-data and the documents, where the index links meta-data to its associated document, processing the extracted meta-data according to one or more detection algorithms to determine if an anomaly exists, and where an anomaly exists, providing a public health event notification, and monitoring and evaluating the responses to the public health events.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a divisional application of U.S. patent application Ser. No. 12/309,637, filed Jan. 26, 2009, entitled “GLOBAL DISEASE SURVEILLANCE PLATFORM, AND CORRESPONDING SYSTEM AND METHOD,” which claims the benefit of PCT Application No. PCT/US2006/036758, filed Sep. 21, 2006, entitled “GLOBAL DISEASE SURVEILLANCE PLATFORM, AND CORRESPONDING SYSTEM AND METHOD” and U.S. Provisional Application No. 60/832,954, filed Jul. 25, 2006, entitled “GLOBAL DISEASE SURVEILLANCE PLATFORM, AND CORRESPONDING SYSTEM AND METHOD,” all of which are herein incorporated by reference in their entirety.

TECHNICAL FIELD

The technical field is medical and public health warning and response systems.

BACKGROUND

National, state, and local governments are responsible for safeguarding the health and safety of their citizens. Today, that responsibility means coping with unprecedented public health challenges, both from natural causes, such as the avian flu, and from deliberate attacks, such as bio-terrorism. Meeting these challenges requires unprecedented levels of cooperation in and among agencies and organizations charged with protecting the safety of communities. Many of these organizations use either proprietary or incompatible technology infrastructures that need to be integrated in order to provide real-time, critical information for effective event monitoring, early event detection, and coordinated emergency response. Information must be shared instantaneously and among numerous entities to effectively identify and respond to a potential threat or emergency-related event.

Significant efforts are underway along these lines, for example, in the public health and bio-terrorism arena. The Centers for Disease Control and Prevention (CDC) of the U.S. Department of Health and Human Services has launched several initiatives aimed at forming nationwide networks of shared health-related information that, when fully implemented, will facilitate the rapid identification of, and response to, health and bio-terrorism threats. The CDC plans the Health Alert Network (HAN), for example, to provide infrastructure that supports distribution of health alerts, disease surveillance, and laboratory reporting. The Public Health Information Network (PHIN) is another CDC initiative that will provide detailed specifications for the acquisition, management, analysis and dissemination of health-related information, building upon the HAN and other CDC initiatives, such as the National Electronic Disease Surveillance System (NEDSS). Other U.S. government agencies, and international agencies, including the U.S. Food and Drug Administration (FDA), the U.S. Environmental Protection Agency (USEPA), the World Health Organization (WHO), and local affiliates of these organizations (e.g., state environmental protection agencies) are also involved in monitoring the outbreak of infectious diseases, or other medical problems, and limiting the spread thereof. These agencies have in place a number of other initiatives, including a Nationwide Health Information Network (NHIN), which will allow consumers to directly manage their personal patient information, with each consumer being able to access and review their information online through a personal data access portal while healthcare professionals utilize a separate and distinct portal. Another initiative is the Real-time Outbreak and Disease Surveillance (RODS) system, which is an open source, computer-based public health surveillance system for early detection of disease outbreaks.
The RODS system is deployed in more than 18 states, Canada, and Taiwan, and was used during the 2002 Winter Olympics. Hospitals send RODS data from clinical encounters over virtual private networks and leased lines using the Health Level 7 (HL7) message protocol. The data are sent in real time. The RODS system automatically classifies a complaint from a hospital visit into one of seven syndrome categories using specific classifiers. The RODS system also has a Web-based user interface that supports temporal and spatial analyses. The RODS system processes sales of over-the-counter healthcare products, but receives such data in a batch mode on a daily basis. The RODS system has been and continues to be a resource for implementing, evaluating, and applying new methods of public health surveillance. Still other initiatives are the Laboratory Response Network (LRN); the FDA's Food Safety Network (eLEXNET); the U.S. Department of Agriculture's FoodNet; the U.S. EPA's National Environmental Public Health Network (NEIEN); and the WHO's Global Outbreak and Alert Response Network.

These initiatives define functional requirements and set standards for interoperability of the information technology (IT) systems that hospitals, laboratories, government agencies and others will use in forming nationwide health networks; however, the initiatives do not solve the problems that exist due to the disparate nature of the data used in the initiatives, the differences between the agencies, and the often opposing needs for both security and quick access to data. For example, a single enterprise, such as a hospital, may have several separate database systems to track medical records, patient biographical data, hospital bed utilization, and vendors. The same is true of the government agencies charged with monitoring local, state and national health. In each enterprise, different data processing systems might have been added at different times throughout the history of the enterprise and, therefore, represent differing generations of computer technology. Integration of these systems at the enterprise level is difficult enough; integration on a national or global level is much more difficult. This lack of easy integration is a major impediment to surveillance, monitoring, identification and early detection, real-time event processing, and response planning and evaluation in the public health and bio-terrorism arenas.

SUMMARY

What is disclosed is a method, implemented on a suitably programmed computing device, for identifying and assessing public health events, comprising capturing structured and unstructured public health-related information, wherein the information is contained in one or more information sources; extracting meta-data from the captured public health-related information; creating an index of the extracted meta-data; and archiving the meta-data and the sources, wherein the index links meta-data to its associated source.

Also disclosed is a global disease surveillance platform, comprising a platform processor, wherein potential public health events are determined and analyzed, and wherein responses to the public health events are monitored; an interface coupled to the platform processor, wherein the interface receives external feeds comprising structured and unstructured data, and wherein meta-data are extracted from the structured and unstructured data, indexed, and related back to the structured and unstructured data; an external services module that provides geo-spatial services; and a storage device, wherein meta-data from the structured and unstructured data, and the structured and unstructured data are stored.

Still further, what is disclosed is an apparatus for managing phases of a public health event, the apparatus including one or more suitably programmed computing devices, the apparatus comprising an interface that receives structured and unstructured data from one or more external data sources, the interface comprising a data transformation module that transforms data from the structured and unstructured data sources into a schema consistent with that of the apparatus, and a data classification module that extracts meta-data related to the structured and unstructured data and creates an index of the meta-data back to the meta-data's structured or unstructured data; a data store coupled to the interface, wherein the indexed meta-data and the structured and unstructured data are stored; a processing component coupled to the interface, comprising analysis algorithms, the analysis algorithms applied to the meta-data, an alert module, wherein when a threshold, as indicated by application of the algorithms to the meta-data, is exceeded, a public health alert is sounded, and access modules that operate to allow real-time access to the structured and unstructured data, and to the corresponding meta-data, wherein a response to the public health event is managed from pre-planning, detection, and response.

Yet further, what is disclosed is a method for managing a response to a public health event during an entire life cycle of the event, the method executed on one or more networked computers, the method comprising receiving information contained in one or more structured and unstructured data sources; initially processing the information, comprising extracting meta-data from the data sources, wherein the meta-data are linked to their corresponding data source, transforming the extracted meta-data, classifying the transformed meta-data, and storing the indexed meta-data and their corresponding data source, wherein the index allows retrieval of the corresponding data source; analyzing the meta-data to determine if a threshold value indicative of a public health event has been exceeded, wherein if the threshold has been exceeded, providing an initial public health event alert, and continuing to collect, process, and analyze information to allow management of the response.

DESCRIPTION OF THE DRAWINGS

The detailed description will refer to the following drawings in which like numbers refer to like items, and in which:

FIG. 1 illustrates governmental and non-governmental agencies and their programs that a global disease surveillance platform (GDSP™) monitors to identify and detect public health problems;

FIG. 2 illustrates an environment in which the GDSP™ operates, and illustrates major components of the GDSP™;

FIGS. 3A-3C are architectural diagrams of the GDSP™;

FIG. 4 is a conceptual model of the GDSP™ functions;

FIGS. 5A-5E are flowcharts illustrating exemplary GDSP™ processes;

FIG. 6 illustrates various functions of the GDSP™ during a public health event;

FIG. 7 illustrates a sample alert feed used with the GDSP™;

FIGS. 8-20 illustrate Web pages associated with implementation and operation of the GDSP™; and

FIG. 21 illustrates a computing network for implementing the GDSP™.

DETAILED DESCRIPTION

In the public health arena, early event detection and rapid response to disease outbreaks, and bio-terrorism, for example, may hinge on the ability to quickly and easily access disparate sources of epidemiological information, including the ability to exploit non-structured data sources such as Internet free text (e.g., email, blogs). This informational access ensures electronic reporting of clinical syndromes from all possible sources, timely notification of all disease outbreaks of urgent local, national, or international importance, support for outbreak response management, and sufficient input to compatible detection, analysis, visualization, and decision support tools so as to enable prompt situation assessments. Accordingly, a global disease surveillance platform (GDSP™), and a corresponding system and a method for implementing the GDSP™, are disclosed. The GDSP™ can be used to perform powerful multi-lingual disease and outbreak searching across multiple sources; mine disease-related data sources using data and text mining tools; and model and monitor diseases and outbreaks using statistical modeling, On Line Analytical Processing (OLAP), visualization, and mapping tools. Access to the GDSP™ may be made by public health officials, and in some aspects, members of the general public through the Internet.

The GDSP™ provides a common set of tools, approaches and data that can be shared at the local, state, federal and international level (see FIG. 1), with the goal of improving the response of the public health community in the area of disease outbreaks, national calamities and pandemics. The GDSP™ aggregates and consumes structured and unstructured health-related data and provides the following high-level functionality:

Data harvesting: The GDSP™ includes components required to extract data from structured and unstructured data sources.

Classification: The GDSP™ provides capabilities to classify information using categories of events.

Fusion: The GDSP™ provides unique capabilities to merge structured and unstructured data and to link, categorize, and rank information.

Search and filtering: The GDSP™ provides capabilities for users to search, mine and filter data.

Alert notifications: The GDSP™ provides an early warning mechanism based on user-defined thresholds.

Output: The GDSP™ provides support for reporting, visualization, temporal analysis and data export.

Restricted access: The GDSP™ provides secure communications access to partners using the GDSP™.

Public access: The GDSP™ provides anonymous access to non-sensitive information.

Response planning and monitoring: The GDSP™ provides users and public health officials with the tools to plan for potential public health events and to manage the response to an event throughout the event's life cycle.

The GDSP™:

-   Enables multidisciplinary collaboration among global, national, state and local public health agencies, community hospitals, academic health centers, community healthcare providers, laboratories, professional societies, medical examiners, emergency response units, safety and medical equipment manufacturers, the media, government officials, and federal agencies such as the U.S. Office of the Assistant Secretary for Public Health Emergency Preparedness, CDC, and the Agency for Toxic Substances and Disease Registry (ATSDR).
-   Identifies, based on specific criteria, emerging and re-emerging public health events, allows close monitoring of unexplained morbidity and mortality due to public health events, such as infectious diseases, and provides for better surveillance for flu-like illness.
-   Establishes communication linkages with laboratory response networks for a rapid evaluation and identification of public health event agents such as bio-terrorism agents.
-   Allows the medical community to collaboratively share, develop, and activate diagnostic clinical and treatment protocols, which are communicated to the medical community and which improve rapid and early detection and reporting of suspect cases, unusual clusters of disease, and unusual manifestations of disease.
-   Provides for public health planning and, where necessary, response, to reduce the morbidity from the public health event by viewing the status of stockpiles of antibiotics, communicating and preparing multilingual patient information, collaboratively developing contingency plans for quarantine, and collaboratively developing and communicating community plans for the delivery of medical care to large numbers of patients and to the “worried well.”
-   Supports the use and expansion of access to health alert networks.
-   Supports collaborative development, with local medical examiners, of contingency plans for mass mortuary services, including plans for the utilization of Federal Disaster Medical Assistance Teams (DMAT) and Mortuary Teams (DMORT).
-   Provides for training, by Communities of Interest, of health organizations that deliver care.
-   Communicates emergency instructions, prevention, control and treatment information.
-   Helps resolve legal issues related to public health authority in emergencies.

FIG. 2 is an overall diagram of a global disease surveillance platform (GDSP™) 100 as it relates to various government and non-government agencies in the public health and bio-terrorism arenas. The GDSP™ 100 exists as part of a global disease surveillance environment 10, and includes enterprise service bus 105 and portal 110, through which processing components of the GDSP™ 100 are accessed; and global disease information repository 120, where critical data needed to operate the GDSP™ 100 and to provide the functionality listed above may reside. Coupled to the GDSP™ 100 are data sources 130, which provide the critical public health data consumed by the GDSP™ 100, and remote users 140, who access the GDSP™ 100 through secure path 111 or unsecure path 113, to gain access to the data and products of the GDSP™ 100; GDSP™ partners 150, which provide the data sources 130, and which receive outputs from the GDSP™ 100; and other data sources 170, such as media services, emails, and blogs. The various remote users 140, and the GDSP™ partners 150, may be linked together over a data network, such as the Internet 160, for example. The users 140 and the partners 150 may interact with the GDSP™ 100 through queries, by subscription, on a transactional basis, and/or through multi-party collaboration hosted within the GDSP™ 100.

The data sources 130 and 170 may include any data source capable of transmitting digital information. Examples of such data sources include SQL data, SQL data via JDBC, flat files, XML, XML Web Services Description Language (WSDL) files, and ANSI EDI files; email, RSS feeds, web service WSDL enabled applications; and SQL data sources. One of ordinary skill in the art will recognize that many other types of data sources may communicate and work with the GDSP™ 100. The data sources 130 and 170 may be maintained at one or more external partners 150 in the system 10. Access to the data sources 130 and 170 may be permitted under an agreement between an external partner 150 and the GDSP™ operators. Other data sources 130, 170 may be freely accessed over the Internet 160. Data in the data sources 130 may be structured, and may be compatible with the schema employed by the GDSP™ 100. Alternatively, the data may be unstructured, and may require mapping to the schema used by the GDSP™ 100. Here, unstructured data refers to masses of (usually) computerized information that do not have a data structure which is easily readable by a machine. Examples of unstructured data may include audio, video and unstructured text such as the body of an email or word processor document. The data in the data sources 170 typically will be unstructured.

The data sources 130 include external partner data feeds. The external partner data feeds may be provided electronically in digital format expressed as spreadsheets, XML documents, CSV documents, email, RSS feeds, and SQL queries, for example. The external partner data feeds may be provided periodically, on-demand, or a combination of periodically and on-demand. The external partner data feeds may include medical data, patient data, environmental data, hospital utilization data, and any other data needed to monitor and control public health. The external partner data feeds are provided to the GDSP™ 100, and may be stored in their original format in external system databases or in the data repository 120, awaiting processing in the GDSP™ 100. Unstructured data derived from the external partner feeds are processed, tagged with meta-data, indexed, and linked to similar content. Structured data are mapped to the GDSP™ schema using components of the GDSP™ 100.

FIGS. 3A-3C are diagrams of an architectural plan of the GDSP™ 100. FIG. 3A is an overall block diagram of an architectural plan 200 of the GDSP™ 100, showing selected components thereof. The architecture 200 may be installed on a networked server, which may be accessible by other network devices. Alternatively, elements of the architecture 200 may be installed on other network devices, or local terminals, that are coupled to the networked server. The other network devices may use the architecture 200 to obtain various views of the GDSP™ process. The other network devices may include a personal computer or a handheld device, for example, and an operator (i.e., human) of the personal computer or the handheld device may use the architecture 200 to obtain a desired view (e.g., avian flu vaccine shipment status) of the GDSP™ process. Another network device may query the architecture 200, without direct human direction or intervention, to obtain information related to the GDSP™ process, for example, by using RSS feeds or a REST API.

The architecture 200 includes components that serve as means for interfacing with external feeds 230 and crawler 236 to access data from these data sources, translating the data into a schema used in the architecture 200, and formulating and executing queries of the data. The architecture 200 includes means for mapping the data sources into the GDSP™ schema. Also included in the architecture 200 are means for providing security for transactions involving the external feeds 230. The architecture 200 further includes means for controlling messaging between the architecture 200 and the external feeds 230. The architecture 200 still further includes means for viewing, analyzing, processing, and storing data from the data sources. Finally, the architecture 200 includes means for executing queries of the processed data from the data sources, as well as the raw data contained in the data sources.

An Enterprise Service Bus (ESB) 220 forms the backbone of the GDSP™ architecture 200. The ESB 220 provides an abstraction layer for message routing, transaction management and application integration, and couples a GDSP™ store 280, the external feeds 230 and the crawler 236, external services module 290, and processing components 250. The processing components 250 also receive inputs from situational awareness module 240, and directly from the external services module 290 (translation services module 297, place location services module 295, traffic services module 293, and geo-location services module 291). Finally, the processing components 250 may be accessed through a browser 295, which may be a standard Internet browser residing on a computing platform of one of the remote users 140.

Data acquisition services are a key element of the GDSP™ 100. To access the information from the external feeds 230 on a real-time, on-demand basis, the architecture 200 may be used to determine a schema related to data from each of the data sources 232, 234, 236 and to map the data to a schema within the architecture 200. To accommodate this mapping, the architecture 200 includes data acquisition, evaluation, and synchronization functions. These functions may be realized by use of an established schema, for example, to which the data in one or more of the data sources 232, 234, 236 are mapped. More specifically, data harvesting components 222 (transformation), 224 (classification), 226 (ontology), and 228 (persistence) within the ESB 220 are used to extract data from structured (232) and unstructured (234, 236) data supplied by the external feeds 230 and the crawler 236, and then pass the harvested data to the processing components 250 of the GDSP™ architecture 200. The GDSP™ 100 can harvest data using pull or push services. Pull services require the GDSP™ 100 to periodically initiate data access functions, while with push services, the GDSP™ 100 passively waits for incoming information. In both cases, once a data target has been identified, the data target will be transformed (harvesting component 222) and routed to the appropriate GDSP™ components 224, 226, and 228 for further processing. Each component 222, 224, 226, and 228 is deployed as a plug-in. Since each data feed will have unique characteristics, new plug-ins can be registered into the GDSP™ architecture 200, thereby providing data flexibility, customization and independence. Both the harvested data and the original (raw) data may be persisted into the GDSP™ store 280.
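The plug-in arrangement described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical in-memory registry; the names `HarvestPipeline`, `register`, and `harvest`, the feed type "email", and the record fields are not taken from the GDSP™ and merely stand in for components 222, 224, 226, and 228.

```python
from typing import Callable, Dict, List

# Hypothetical stage names mirroring components 222, 224, 226, and 228.
STAGES = ["transformation", "classification", "ontology", "persistence"]

Plugin = Callable[[dict], dict]


class HarvestPipeline:
    """Toy registry: at most one plug-in per stage, per feed type (an assumption)."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Dict[str, Plugin]] = {}

    def register(self, feed_type: str, stage: str, plugin: Plugin) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._plugins.setdefault(feed_type, {})[stage] = plugin

    def harvest(self, feed_type: str, raw: dict) -> dict:
        """Run the registered stages in order and keep the raw record alongside."""
        record = dict(raw)
        for stage in STAGES:
            plugin = self._plugins.get(feed_type, {}).get(stage)
            if plugin is not None:
                record = plugin(record)
        return {"raw": raw, "harvested": record}


store: List[dict] = []  # stands in for the GDSP store 280

pipeline = HarvestPipeline()
pipeline.register("email", "transformation",
                  lambda r: {**r, "text": r["body"].strip().lower()})
pipeline.register("email", "classification",
                  lambda r: {**r, "category": "avian flu" if "h5n1" in r["text"] else "other"})
pipeline.register("email", "persistence",
                  lambda r: (store.append(r) or r))

# Push-style delivery: the feed hands a message to the pipeline.
result = pipeline.harvest("email", {"body": "  Suspected H5N1 case reported  "})
print(result["harvested"]["category"])
```

A pull service would call `harvest` from a periodic job instead of waiting for a delivery; the stages themselves would not change.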

For unstructured information like free-text or derived reports, the GDSP™ 100 mines the Web (both publicly accessible sites as well as partner sites that require authenticated access) based on predefined algorithms/set of rules for standard key words, such as those in the Unified Medical Language System (UMLS), and/or a particular concept. Once a data source is found, that source's raw information will be cached and stored in the GDSP™ 100 with a reference link mapping to the data source; categorization of the data is also applied to the raw data based on pre-defined ontology services 226. Classification service 224 further assesses the information based on discovered relationship(s) with other concepts or documents using a real-time scoring algorithm as part of the ESB 220. For example, when mining ProMED listserv email for H5N1 avian influenza (bird flu), a GDSP™ service agent (not shown) will scan every email looking for UMLS keywords. Once a target data source is acquired by the ESB 220, the data source is cached and persisted (228) in the GDSP™ store 280, and this process continues for all the target documents. Parallel to this process, each acquired document will be tagged/indexed with one or more predefined categories in 226 for further correlation. A GDSP™ user on the front end (i.e., at 140/150; see FIG. 2) may access this categorized and cached information using the portal 260.
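The keyword scan and category tagging described above might look roughly like the following sketch. The keyword table, message identifiers, and message bodies are invented for illustration; an actual deployment would draw its vocabulary from UMLS, which is not reproduced here, and the function name `tag_document` is hypothetical.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword-to-category map; real terms would come from UMLS.
KEYWORDS: Dict[str, str] = {
    "h5n1": "avian influenza",
    "avian influenza": "avian influenza",
    "bird flu": "avian influenza",
    "anthrax": "bio-terrorism",
}


def tag_document(doc_id: str, text: str) -> Dict[str, object]:
    """Return matched keywords and categories, plus a link back to the source."""
    lowered = text.lower()
    matched: List[str] = [
        kw for kw in KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    ]
    categories: Set[str] = {KEYWORDS[kw] for kw in matched}
    return {"source": doc_id, "keywords": matched, "categories": sorted(categories)}


emails = {
    "msg-001": "Two poultry workers tested positive for H5N1 this week.",
    "msg-002": "Quarterly newsletter: staff picnic on Friday.",
}

# Build a category -> document index; documents with no match are skipped.
index: Dict[str, List[str]] = {}
for doc_id, body in emails.items():
    tags = tag_document(doc_id, body)
    for category in tags["categories"]:
        index.setdefault(category, []).append(doc_id)

print(index)
```

The index stores only document identifiers, so the cached raw document can always be retrieved from the category under which it was filed.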

For semi- or highly structured information 232, the GDSP™ 100 may import the information into a predefined table schema. For example, WHO avian flu reports are semi-structured with information on location, gender, pathogen type, route of transmission, total fatalities and cases, etc., which can be scrubbed and loaded into a predefined table schema in the GDSP™ store 280.

As noted above, the external feeds 230 can supply structured data 232 and unstructured data 234 to the GDSP™ 100. The GDSP™ 100 may also use a data acquisition device, such as the crawler 236, to access the Internet in search of various (mostly unstructured) data sources. The crawler 236 operates on a continuous basis. The crawler 236 is programmed to search worldwide for data sources related to public health and bio-terrorism. Such programming may include use of key words, including those in the UMLS, for example. The crawler 236 may also be programmed to “learn” new search criteria. For example, the crawler 236 may return an unstructured data source based on UMLS key word searching. The crawler 236 may identify other terms in the data source, and may use these new terms for subsequent Internet searching. Alternatively, the crawler 236 receives “feedback” from the ESB 220, such as meta-data extracted from the sources 232, 234, and uses the feedback as a basis for future searching. For example, if the meta-data from an information source includes a specific Web address, the crawler 236 may look for all further data sources having the same Web address. Other algorithms may be incorporated into the crawler to facilitate comprehensive and efficient Internet searches.

Data fusion components allow the GDSP™ 100 to automatically analyze structured and unstructured data in 230 and link similar data together. Data fusion components also allow the GDSP™ user/analyst to effectively review a wide range of data sources. Some aspects of data fusion reside in the ESB 220 (information sources 232, 234 feeding into ESB components 222, 224, 226, and 228), while others reside in fusion module 265, which will be described in conjunction with FIG. 3B. Data fusion provides a holistic view into the data platform and gives the GDSP™ user an opportunity to review and consume a variety of data points. Data fusion components within the ESB 220 provide a more comprehensive picture of the data by merging information from disparate sources (the external feeds 230, for example) despite differing conceptual, contextual and typographical representations as means for consolidating data from structured or unstructured resources. For example, a ProMED email alert concerning avian flu in Thailand might consist of free text with parameters on the location, transmission route, age, etc. of a suspect case in one region. At the same time a video stream as well as news feeds from the media and blogs may report the incident. Looking at these sources of information separately may not alert an analyst, or cause an automated alert; however, when these sources of information are put together, using the ESB 220 components, the “fused” information sources may point to a single location (as processed by configurable geo-coding service 291) at one point in time. Such a congruence of data feeds could indicate an anomaly, which could trigger an automated public health alert (alert module 262), and/or which would help the GDSP™ decision maker/analyst gain a cohesive picture of the threat and be able to orchestrate a coordinated response using GDSP™ tools.
(Examples of such tools are those provided in situational awareness and response services component 240, including WebTAS 241, Google Maps 243, Yahoo Maps 245, Google Earth 247, Visual Earth 249, and the collaboration service 267.) In addition, the ESB 220 brokers requests between the GDSP™ 100 and external services 290, such as Place Location Services 295 (e.g., nearby hospitals, fire departments, pharmacies, police stations, hospital capacity, etc.), by accessing Yahoo services using a REST API.
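The congruence test in the fusion example, where several independent sources point to one place at about the same time, can be illustrated with a short sketch. The reports, dates, and both thresholds (`window_days`, `min_sources`) are assumptions made for this illustration, the function name `fuse` is hypothetical, and the places are taken to be already geo-coded.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical, already geo-coded reports: (source type, place, report date).
reports: List[Tuple[str, str, date]] = [
    ("promed_email", "Chiang Mai, TH", date(2006, 7, 3)),
    ("news_feed", "Chiang Mai, TH", date(2006, 7, 4)),
    ("blog", "Chiang Mai, TH", date(2006, 7, 4)),
    ("news_feed", "Lyon, FR", date(2006, 7, 4)),
]


def fuse(items, window_days: int = 3, min_sources: int = 3) -> List[str]:
    """Return places reported by at least min_sources distinct source types
    whose reports all fall within window_days of each other."""
    by_place: Dict[str, List[Tuple[str, date]]] = defaultdict(list)
    for source, place, day in items:
        by_place[place].append((source, day))

    flagged = []
    for place, entries in by_place.items():
        days = [d for _, d in entries]
        sources = {s for s, _ in entries}
        span = (max(days) - min(days)).days
        if len(sources) >= min_sources and span <= window_days:
            flagged.append(place)
    return flagged


print(fuse(reports))
```

No single report in this data would cross a threshold on its own; the flag comes only from the agreement of three source types on one place within the window.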

The ESB 220 also allows subscribers in 140 and 150 (see FIG. 2) to automatically receive information using Web services component 254 (available via a REST API, RSS), or to manually extract data using data export component 266. This enables further analysis by GDSP™ users employing their own classification and analysis tools beyond what is provided in the GDSP™ architecture 200.

Once transformed into the data schema used by the architecture 200, the data are classified, using classification component 224, which adds, for example, meta-data tags to each data point, indicating how the data may be used, its “shelf life,” access requirements, and other information. The classified data are then passed to the persistence component 228 for eventual storage in the GDSP™ store 280.
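One way to picture the meta-data tags mentioned above is as a small record attached to each data point. The sketch below is an assumption about shape only: the class name `MetaTag`, its field names, and the sample values are hypothetical and do not come from the GDSP™ schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MetaTag:
    """Hypothetical tag attached to one data point by the classifier."""
    usage: str              # how the data may be used
    shelf_life: timedelta   # how long the data stay current
    access_level: str       # e.g. "public" or "restricted"
    created: datetime

    def is_expired(self, now: datetime) -> bool:
        """A data point is stale once its shelf life has elapsed."""
        return now > self.created + self.shelf_life


tag = MetaTag("outbreak monitoring", timedelta(days=14), "restricted",
              created=datetime(2006, 7, 1))
print(tag.is_expired(datetime(2006, 7, 10)), tag.is_expired(datetime(2006, 7, 20)))
```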

Within the processor 250, the XML utility 256 is used to read an XML document, or data source, in one format, and transform that document into another XML formatted document as well as to provide the ability to query information in an XML source. For example, an external service in 240 or 290, like Yahoo traffic, Yahoo places, or Yahoo geo-coders, provides information in a specific XML format, which then needs to be transformed by XML utility 256 using, for example, the APACHE digester (based on certain rules) into a list of traffic objects (traffic POJO class). The traffic POJO list is subsequently fed into a transformation object in the XML utility 256 that will transform the JAVA object into an XML stream compatible with the maps in Google Maps component 243. Upon completion of this transformation, the Google XML formatted data is sent back to the browser 295 and used by an API within the Google Maps component 243 to overlay markers on a map. For example, this process can be used to display nearby hospitals and their capacity at a certain location where a potential public health event exists.
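The two-step transformation described above (provider XML into plain objects, then plain objects into a map-ready XML stream) is sketched below. The text describes a Java implementation using the Apache Digester; this sketch instead uses Python's standard `xml.etree.ElementTree` to show the same shape. Both XML formats, the element and attribute names, and the sample hospitals are invented for illustration and are not the actual Yahoo or Google formats.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical provider response; the element names are invented.
PROVIDER_XML = """
<results>
  <place><title>General Hospital</title><lat>13.75</lat><lon>100.50</lon><beds>120</beds></place>
  <place><title>City Clinic</title><lat>13.76</lat><lon>100.52</lon><beds>15</beds></place>
</results>
"""


@dataclass
class Place:
    """Plain intermediate object, playing the role of the POJO in the text."""
    name: str
    lat: float
    lon: float
    beds: int


def parse_places(xml_text: str) -> List[Place]:
    """Step 1: read the provider format into plain objects."""
    root = ET.fromstring(xml_text.strip())
    return [
        Place(
            name=node.findtext("title", default=""),
            lat=float(node.findtext("lat", default="0")),
            lon=float(node.findtext("lon", default="0")),
            beds=int(node.findtext("beds", default="0")),
        )
        for node in root.findall("place")
    ]


def to_marker_xml(places: List[Place]) -> str:
    """Step 2: emit a second, equally hypothetical format for a map overlay."""
    markers = ET.Element("markers")
    for p in places:
        ET.SubElement(markers, "marker", {
            "lat": str(p.lat),
            "lng": str(p.lon),
            "label": f"{p.name} ({p.beds} beds)",
        })
    return ET.tostring(markers, encoding="unicode")


places = parse_places(PROVIDER_XML)
marker_xml = to_marker_xml(places)
print(marker_xml)
```

Keeping the intermediate objects independent of either XML format means a new provider or a new map component would need only one new converter, not a new pair.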

The various services that constitute the data sources 230 may include security measures to, for example, limit access to data and processes used by the services. For example, an external partner 150 may use an application that incorporates various security measures. The architecture 200 may use these security measures when managing access to data from the external partner's data source. Alternatively, the architecture 200 may provide security measures. For example, the security adapter 263 may limit access to query data from a specific data source to only those individuals or machines that possess a specific password and log-on name. The security adapter 263 may establish role-based access such that, for example, an organization's managers would be able to access certain medicinal data, but would not be able to access certain patient data, which could be restricted to the organization's medical services personnel. The security adapter 263 can implement access restrictions based on a user's identification as a “normal user” or as a “system administrator.” The security adapter 263 also supports multiple clients and multiple projects within a client.
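The role-based restriction in the manager and medical-personnel example can be expressed as a simple lookup. This is a sketch under stated assumptions: the role names, resource names, and the deny-by-default rule are hypothetical and are not taken from the security adapter 263.

```python
from typing import Dict, Set

# Hypothetical role-to-permission table following the example in the text.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "manager": {"medicinal_data"},
    "medical_staff": {"medicinal_data", "patient_data"},
}


def can_access(role: str, resource: str) -> bool:
    """Deny by default: unknown roles and unlisted resources get no access."""
    return resource in ROLE_PERMISSIONS.get(role, set())


print(can_access("manager", "medicinal_data"))
print(can_access("manager", "patient_data"))
print(can_access("medical_staff", "patient_data"))
```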

The GDSP™ 100 provides for automated and manual (i.e., human) detection and analysis of a potential public health event. To execute this function, the GDSP™ includes various algorithms (algorithm component 270 shown in FIG. 3A) that can be applied (automatically or manually) at different points over the life cycle (i.e., from outbreak to termination) of the potential public health event, depending on the characteristics of the event. Some of the most useful algorithmic approaches involve multivariate and univariate time series algorithms, which include CuSum with EWMA, recursive least squares, Wavelet, and simple moving average. In addition, the GDSP™ 100 uses Bayesian analysis as a means to provide early disease detection. Bayesian analysis computes the probability that an event such as an outbreak is taking place based on related information that is evolving in time. Analysis has shown that outbreak detection is more reliable when several different factors increase together, even if none of the factors individually exceeds a particular response threshold, because when only a single factor “spikes,” that factor often represents only outlier data. As used in the GDSP™ 100, Bayesian analysis can routinely fuse heterogeneous data by discovering and quantifying the hidden relationships among the data. This also allows the GDSP™ 100 to create and deploy increasingly sophisticated algorithms that take other algorithms as input.
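The observation that several factors rising together are more telling than one isolated spike can be illustrated with a simple Bayesian update. The sketch below is not the GDSP™ algorithm; it assumes the indicators are conditionally independent (a naive Bayes simplification) and that each indicator has already been reduced to a likelihood ratio, and the prior and ratio values are invented for illustration.

```python
from typing import Iterable


def outbreak_posterior(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Posterior probability of an outbreak after fusing several indicators.

    Each likelihood ratio is P(observation | outbreak) / P(observation | no outbreak).
    Under the independence assumption, posterior odds = prior odds * product of ratios.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# One indicator spikes strongly while the other two are uninformative ...
single_spike = outbreak_posterior(0.01, [8.0, 1.0, 1.0])
# ... versus three indicators that each rise only moderately.
joint_rise = outbreak_posterior(0.01, [4.0, 4.0, 4.0])
```

With these illustrative numbers, the three moderate increases yield a markedly higher posterior than the single strong spike, which is the behavior the paragraph above describes.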

Another tool used in the GDSP™ 100 is cluster detection. A cluster is an increased density (incidence) of cases in time or space.

The time series algorithms used in the GDSP™ 100 provide associated caching schemes for both data and graphs in order to serve a large number of simultaneous Web-based users, each of whom may request multiple graphic displays at a time. These algorithms are described below:

1. CuSum-EWMA: CuSum is a class of algorithms that can detect gradual changes in the mean and/or standard deviation of a time series by forming cumulative sums from the prediction errors. The CuSum implementation uses an exponentially weighted moving average (EWMA) to predict the next value in a time series. As implemented in the GDSP™ 100, a user-specified threshold value and a standard CuSum procedure on the forecast errors are used to determine whether to generate a public health alert. The value of the threshold line is computed by calculating the minimum value that would induce an alert under the above CuSum procedure. The algorithm generates an alert when the cumulative sum exceeds the threshold.

2. Moving Average: The Moving Average algorithm predicts the next value to be the average of the previous [W] values in the time series, where [W] is the window size. The prediction error is computed by subtracting the predicted value of the time series from the observed value. As implemented in the GDSP™ 100, the algorithm generates a public health alert when the prediction error exceeds a threshold based upon historical data. The value of the threshold line on day [d] of a time series is determined by first computing a forecast for day [d] by averaging the data for a preceding period (e.g., for the preceding 30 days). Then, historical forecasts for the 90 days (assuming use of the preceding 30-day period) that precede day [d] are computed using a 30-day average for each forecast. Next, the historical forecast errors are computed by subtracting the forecast from the actual value. Finally, the value of the threshold line on day [d] is computed from these historical forecast errors.

3. Recursive Least Squares: The Recursive Least Squares (RLS) algorithm uses linear regression to construct a forecast for day [d] of a time series. The regression model is similar to an Auto Regressive Integrated Moving Average (ARIMA) model that incorporates auto-regression, 7-day differencing, and a 7-day moving average to produce forecasts. Historical forecast errors are computed by subtracting the forecast from the actual value and are used to compute the threshold line on day [d]. As implemented in the GDSP™ 100, the RLS generates an alert when the prediction error exceeds a threshold based on the historical data.

4. Wavelet: The wavelet-based anomaly detector (WAD) is designed to detect abrupt changes in a time series by using the wavelet transform to remove short and long-term trends from the time series. The resulting smoothed time series are used to produce forecasts. Historical forecast errors are computed by subtracting the forecast from the actual value and are used to compute the threshold line on day [d]. As implemented in the GDSP™ 100, the WAD generates an alert when the wavelet prediction error exceeds a threshold based on historical data.

5. Bayesian Spatial Scan Statistic Algorithm: As implemented in the GDSP™ 100, a spatial scan statistic (SSS) algorithm searches a geographic region [R] for a subregion [S] that has an unexpectedly high count of some quantity of interest. One such quantity would be the number of reported cases of salmonella food poisoning by location (e.g., by zip code). The search is performed over shapes of a particular type, such as circles, ellipses, or rectangles; for a given type of shape, many sizes of that shape are considered. With a branch-and-bound search technique, the time normally needed to find the subregion [S*] that is most likely to contain an outbreak decreases by about a factor of 1000. When executed, this Bayesian algorithm is about one million times faster than conventional algorithms that perform a corresponding outbreak detection task.
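The first two detectors in the list above can be sketched in a few lines. This is a simplified illustration rather than the GDSP™ implementation: the smoothing factor `alpha`, the `slack` allowance, and both threshold conventions are assumptions, and in particular the text does not say how the moving-average threshold is derived from historical errors, so the sketch assumes "mean plus k standard deviations."

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def ewma_cusum_alerts(series: Sequence[float], alpha: float = 0.3,
                      slack: float = 0.5, threshold: float = 4.0) -> List[int]:
    """One-sided CuSum on EWMA forecast errors; returns the days that alert.

    slack and threshold are in the same units as the series (assumed values).
    """
    alerts = []
    forecast = series[0]          # seed the EWMA with the first observation
    cusum = 0.0
    for day, observed in enumerate(series[1:], start=1):
        error = observed - forecast
        cusum = max(0.0, cusum + error - slack)   # accumulate only upward drift
        if cusum > threshold:
            alerts.append(day)
            cusum = 0.0           # restart accumulation after an alert
        forecast = alpha * observed + (1 - alpha) * forecast
    return alerts


def moving_average_alert(series: Sequence[float], window: int = 30,
                         history: int = 90, k: float = 3.0) -> Tuple[float, float, bool]:
    """Forecast, threshold, and alert flag for the last day d of the series."""
    d = len(series) - 1
    if d < window + 2:
        raise ValueError("series too short for the chosen window")

    def forecast_for(day: int) -> float:
        return mean(series[day - window:day])     # average of the preceding window

    first = max(window, d - history)
    errors = [series[t] - forecast_for(t) for t in range(first, d)]
    threshold = mean(errors) + k * pstdev(errors)  # assumed threshold rule
    forecast = forecast_for(d)
    return forecast, threshold, (series[d] - forecast) > threshold
```

For example, a flat series followed by a slow ramp triggers `ewma_cusum_alerts` only after several days of accumulated error, whereas `moving_average_alert` reacts to a single day that departs sharply from the preceding 30-day average.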

Additional algorithms may be included, such as the Bayesian Aerosol Release Detector (BARD) algorithm developed by the University of Pittsburgh RODS laboratory, to support criminal and epidemiological investigations. This algorithm type supports event reconstruction and analysis, which, particularly in a bio-terrorism scenario, may help law enforcement (as well as epidemiological) investigators catch the event perpetrators before they can strike again. This model analyzes emergency room complaints and meteorological data to compute a posterior probability of, for example, anthrax release, release time, release location, release quantity, and number of affected individuals. The model is akin to an “inverse plume” model, in that the model can take casualty number and location data and calculate the approximate time, place and amount of a deliberate infectious but non-contagious aerosol pathogen release. The model combines the Gaussian atmospheric dispersion model with a Bayesian network. The Bayesian network represents knowledge about the relationship between observable biosurveillance data, disease parameters, and exposure to aerosolized biological agents. The method can also be applied to other types of biosurveillance data including results from BioWatch monitoring (an early warning program intended to detect the release of biological agents within 36 hours of their release).

Furthermore, capturing normal behavior traits using mathematical methodology establishes patterns that, when violated, may indicate anomalous behavior. Belief Networks (BNs) develop a context sensitive characterization of normal and abnormal activity and provide a probabilistic assessment, with the understanding that some false positives are generated, in order to ensure that true threats are not overlooked. To meet this objective, the GDSP™ 100 supports hybrid BNs that fuse ensembles of Bayesian BNs, Dempster-Shafer BNs, and other probabilistic reasoning machinery to process observations in the context of knowledge. The result is a probabilistically ranked threat list that is used to search for new hypotheses and to task for the “best next observations” to explain anomalous behaviors.

Finally, the GDSP™ 100 allows analysts to codify their heuristic “rules of thumb” as detection algorithms, which can be captured in the logic of a commercial business rules engine product. These “rules of thumb” can identify potential threats that are best characterized by logical conditions rather than mathematical analysis.

FIG. 3B is a block diagram of the architectural features of fusion module 265 and collaboration module 267. As shown in FIG. 3B, the fusion module 265 receives data from the ESB 220 (data from the sources 232, 234, 236; see FIG. 3A) and the browser 295 (by way of portal 260; see FIG. 3A). The fusion module 265 may also receive inputs from the algorithms module 270. If the received data are not already converted/transformed to the schema used by the GDSP™ 100, that processing takes place in sub-module 281. Furthermore, unstructured data (e.g., email) is converted to a consistent format used by the GDSP™ 100. Such conversion may be executed by a translation algorithm that comprises the sub-module 281. Next, the data are analyzed (if not already completed) by analyzer/scorer sub-module 283 according to a set of criteria such as the presence of specific key words that are universally recognized as pertaining to a specific public health event (e.g., avian flu; anthrax; E. coli). The data are also scored based on the relevancy and accuracy of the information contained in the specific data source. For example, a pathology report from an accredited hospital may score higher, and may be considered more accurate, than a news report from a media outlet. Furthermore, the same pathology report would likely be considered more relevant to the determination of a public health event than would a general news article about the same public health event, in that the pathology report contains more directly pertinent and specific data and represents the real-time observations of a public health professional, while a news report is generally a distillation of facts written and targeted to appeal to persons of limited education. Scoring algorithms within the sub-module 283 are able to discern inconsistent “facts” stated in an information source: for example, an incorrectly stated pathway for a biological agent may cause the supplying information source to be scored with a lower accuracy than if the pathway was correctly stated.
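The keyword and source-based scoring described above might look like the following sketch. The keyword list, the reliability weights, and the rule that combines relevance and accuracy into one score are all hypothetical; the text names the criteria but not how they are weighted.

```python
from typing import Dict, List, Union

# Hypothetical key words taken from the examples in the text.
EVENT_KEYWORDS = ("avian flu", "anthrax", "e. coli")

# Hypothetical accuracy weights per source type (higher = more trusted).
SOURCE_ACCURACY = {
    "hospital_pathology_report": 0.9,
    "news_report": 0.4,
}


def score_document(text: str, source_type: str) -> Dict[str, Union[float, List[str]]]:
    """Score a document for relevance (key words) and accuracy (source type)."""
    lowered = text.lower()
    matched = [kw for kw in EVENT_KEYWORDS if kw in lowered]
    relevance = min(1.0, len(matched) / 2)             # saturates at two key words
    accuracy = SOURCE_ACCURACY.get(source_type, 0.2)   # unknown sources score low
    return {
        "keywords": matched,
        "relevance": relevance,
        "accuracy": accuracy,
        "score": relevance * accuracy,
    }


report = "Culture confirmed E. coli; anthrax ruled out."
pathology = score_document(report, "hospital_pathology_report")
news = score_document(report, "news_report")
```

With identical text, the pathology report receives the higher overall score solely because of its source type, mirroring the example in the paragraph above.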

After analysis and scoring, the information is processed by tag/index sub-module 285 that adds, if not already present, temporal and geo-spatial information to the information. The sub-module 285 also assigns an index number, if not already assigned, which serves to identify the processed information and as a reference to the original, unprocessed information source. The tagged and indexed information is stored in the GDSP™ store 280. The processor sub-module 287 receives the indexed and tagged data, along with the score assigned to the data. A triggering algorithm in the sub-module 287 determines if the data should receive an analyst's review, and the urgency of that review. For example, if the score exceeds an alerting threshold, the sub-module 287 may flag the data for human review, and may send the data (by, for example, email) to one or more GDSP™ analysts. Alternatively, just the data's index may be sent. The processor sub-module 287 also compares the meta-data extracted from the information source, and determines if the information source relates to an existing public health event, or should be assigned to a new public health event. If the information source is to be assigned to an existing public health event, the data's index may be appended to indicate the identity of the appropriate, existing public health event. If the information source does not appear to relate to an existing public health event, then one of two steps is completed. If the information source is scored sufficiently highly, then a new, provisional, public health event may be created, and the information source appended with a corresponding event identifier. If the information source does not score high enough, the information source may be placed in a holding register, awaiting the receipt of additional information sources that appear to relate to a common event.
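The routing decisions made by the processor sub-module 287 can be summarized in a short sketch. The threshold values, the `Document` fields, and the rule that two sources belong to the same event when disease and region match are assumptions made for illustration; the text leaves the meta-data comparison unspecified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Document:
    index: int                      # index number assigned by sub-module 285
    disease: str
    region: str
    score: float
    event_id: Optional[str] = None


@dataclass
class Processor:
    alert_threshold: float = 0.8        # assumed: above this, flag for analyst review
    new_event_threshold: float = 0.6    # assumed: at or above this, open a provisional event
    events: Dict[Tuple[str, str], str] = field(default_factory=dict)
    holding: List[Document] = field(default_factory=list)
    review_queue: List[int] = field(default_factory=list)

    def process(self, doc: Document) -> str:
        if doc.score > self.alert_threshold:
            self.review_queue.append(doc.index)   # only the index is forwarded here
        key = (doc.disease, doc.region)           # hypothetical matching rule
        if key in self.events:
            doc.event_id = self.events[key]
            return "existing"
        if doc.score >= self.new_event_threshold:
            doc.event_id = f"provisional-{len(self.events) + 1}"
            self.events[key] = doc.event_id
            return "provisional"
        self.holding.append(doc)                  # wait for corroborating sources
        return "held"
```

A low-scoring first report is held, a later high-scoring report on the same disease and region opens a provisional event and is queued for review, and subsequent reports attach to that event regardless of score.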

Following processing by the fusion module 265, the data is next routed to collaboration module 267, where the data is made available to GDSP™ users. In one embodiment, the data is presented to a virtual meeting room, using meeting room coordinator sub-module 282. More specifically, data from a specific information source that is identified as relating to an existing public health event is provided to a meeting room established in the GDSP™ 100 to help manage the response to that event. Once the analyzed, scored, tagged, indexed, and identified data is provided to the meeting room, that data is available to any GDSP™ user who is able to access the specific meeting room (meeting rooms may be password protected, for example). Following assignment to a meeting room (if appropriate), the data is processed by aggregator sub-module 284, which compiles all the related data into a single file for eventual storage (sub-module 286) in the GDSP™ store 280. Note that GDSP™ users may introduce new data sources into the meeting room. The aggregator sub-module 284 processes this new data so that it is properly identified with the other data assigned to the meeting room. The aggregator sub-module 284 may also provide this additional information to the fusion module 265 for analysis and scoring. Additionally, the GDSP™ users may perform various analyses, write notes or comments, or otherwise interact with the data assigned to the meeting room. The aggregator sub-module 284 ensures that any of these data elements are properly related, and stored with other data related to the specific public health event. In the case in which a meeting room is not established for a public health event, the aggregator sub-module 284 ensures that all related data are properly identified and stored in a common file.

FIG. 3C shows selected architectural features of message broker 255, and its connection to other components of the architecture 200. The message broker 255 receives inputs from a variety of components, including, but not limited to, the ESB 220. For example, the ESB 220 may provide an initial alert notice, based on the processed data from a specific information source, that a public health event may exist. Message data synchronization sub-module 257 compares this alert with other automated or manual alerts to determine if a new notification, or alert, is justified. For example, an alert received at 1 p.m. EST may simply duplicate information contained in an alert received five minutes earlier. Alternatively, the message broker 255 may be processing an outgoing alert at the same time as a new alert is received. Rather than duplicating alert notifications, the sub-module 257 simply combines any information from outstanding alerts so that a single notification issues. Messaging algorithm 259 provides a triage function, or other function, so that the highest priority alert notification addressees are notified first (note that the notifications may be provided to human users and to other computer systems, news media, etc.). Supervisor and control sub-module 253 determines the mode(s) of notification, such as email, automated telephone call, or both, for example. For calls and emails, multiple addresses may be used. The sub-module 253 may also monitor the communications path for a read response. If such a read response is not received within a specified time, the sub-module 253 may employ other means to communicate with the designated individual or system. For example, failure to get a read back from a primary individual may cause the sub-module 253 to issue a message to a secondary individual stating that the primary individual has not been notified.
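The three message broker behaviors described above (combining duplicate alerts, notifying the highest priority addressees first, and escalating when no read response arrives) can be sketched as follows. The ten-minute merge window, the numeric priority convention (lower number means higher priority), and all names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Alert:
    event_id: str
    received_at: float       # minutes since some reference time
    details: Set[str]


def merge_alerts(alerts: List[Alert], window_minutes: float = 10.0) -> List[Alert]:
    """Combine alerts for the same event that arrive within the window."""
    notifications: List[Alert] = []
    latest: Dict[str, int] = {}          # event_id -> position of its open notification
    for alert in sorted(alerts, key=lambda a: a.received_at):
        pos = latest.get(alert.event_id)
        if pos is not None and alert.received_at - notifications[pos].received_at <= window_minutes:
            notifications[pos].details |= alert.details   # fold into the pending notice
        else:
            latest[alert.event_id] = len(notifications)
            notifications.append(Alert(alert.event_id, alert.received_at, set(alert.details)))
    return notifications


def notification_order(recipients: List[Tuple[str, int]]) -> List[str]:
    """Triage: recipients with the lowest priority number are notified first."""
    return [name for name, _ in sorted(recipients, key=lambda r: r[1])]


def escalation_target(acknowledged: bool, minutes_waited: float,
                      timeout: float, secondary: str) -> Optional[str]:
    """Return the secondary contact if the primary has not responded in time."""
    if not acknowledged and minutes_waited >= timeout:
        return secondary
    return None
```

In this sketch an alert that arrives five minutes after an earlier alert for the same event does not produce a second notification; its details are simply added to the first.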

FIG. 4 is a conceptual model of the GDSP™ functions. Starting with the function, “Capture Information,” and working clockwise, provides an approximate temporal relationship among the functions. Thus, once the GDSP™ 100 captures information, that information is processed and indexed, analyzed, investigated, and archived. Along the way, analysis of the information may lead to alerts that a potential public health threat has been identified and detected.

FIGS. 5A-5E are flowcharts illustrating exemplary GDSP™ processes. Because of the data intensive nature of these processes, one or more computing devices, suitably programmed to execute GDSP™ code, are used to complete the processes.

FIG. 5A is a flowchart illustrating an overall GDSP™ process 600. The process 600 begins with block 601, wherein public health information is available to the GDSP™ 100 shown in FIG. 2. In block 610, information from various sources 150, 170 arrives at the GDSP™ 100, and the process of data entry and analysis begins in the ESB 220 and the processing components 250 of the GDSP™ architecture 200 shown in FIG. 3A. The information sources include other surveillance and biosurveillance systems as well as non-surveillance systems such as clinical systems, lab systems, reporting systems, media reports, blogs, articles, and other sources. Some of these sources may push information to the GDSP™ 100; for example, the GDSP™ 100 may subscribe to certain information sources. Other sources may be queried by agents of the GDSP™ 100, and may provide information in response to these queries. In general, the GDSP™ 100 receives information on a regular, periodic basis.

Next, in block 620, the GDSP™ 100 processes and indexes the incoming information. Since incoming information is received on a regular, periodic basis, the processing and indexing of block 620 also occurs on a regular, periodic basis. However, the GDSP™ 100 may buffer certain incoming information before executing block 620. In processing and indexing the incoming information, the GDSP™ 100 develops a consistent set of meta-data to describe each information source, or document, and to allow indexing the derived meta-data to the entire information source or document. Thus, not only is a database of meta-data created, but the entries in the meta-data database are indexed to the original information source or document, and the original information source or document is later archived in such a manner as to be retrievable for later review and analysis if needed. The result of the processing and indexing is a consistent set of meta-data that the GDSP™ algorithms can use to identify anomalies, such as outbreaks, pandemics, epidemics, or bio-terrorism acts, for example, and to support the investigation, by GDSP™ users, of potential threats.

To develop the consistent set of meta-data, unstructured data such as email, blogs, and RSS feeds are parsed by unstructured text and natural language processors to extract the meta-data, and may be broken down into smaller individual event reports for clarity. The extracted meta-data are then tagged to indicate data quality.

Once tagged, the meta-data may be routed (blocks 622, 623) to system administrators for manual processing. The decision to route for manual processing may be based on, for example, the content of the meta-data (e.g., the source, time and date), a possible relationship to an existing public health event, or a policy that all information be routed for manual processing. Video and audio media are initially translated into text and then processed in the same manner as unstructured data.

Finally, once the meta-data are extracted, agents within the GDSP™ 100 may review the meta-data, and, based on thresholding algorithms, provide alerts (blocks 632, 633) to specific portions of the GDSP™ 100, to systems linked to the GDSP™ 100, and to individual GDSP™ users (e.g., the users 140; see FIG. 2).

In block 700, the GDSP™ 100 is used to determine if a public health event exists, or to update an existing event. That is, the processed data may indicate a new potential public health threat, or may relate to an existing potential or established (declared) public health event. Processing to determine the existence of such an event is described in detail with respect to FIGS. 5B and 5C. If a public health event is deemed to exist, or if the status of the event has changed by a significant enough amount, the process 600 proceeds to block 900, and a public health decision is rendered. This decision has both automated and manual (i.e., human intervention) aspects: specifically, automated and manual event notifications. One additional aspect of the processing associated with block 900 is to seek additional information before concluding that a public health event definitely exists, block 920. The processing associated with block 900 will be described in detail with respect to FIG. 5D. If a public health event exists, or has changed sufficiently, then the GDSP™ 100 supports various communications and response actions, block 1000. Should the declared public health event continue to exist, then the process 600 returns to block 610, and additional information related to the ongoing event is captured. Using many of the same functions of the initial public health decision process 900, the effectiveness of response actions is monitored and evaluated, with possible new response recommendations being generated. Finally, once the communications and response processing has been completed, and the public health threat eliminated, or at least substantially reduced, the various information, analysis, and reports are archived, and then the processing ends, block 1100, as to that specific public health event.

Returning to the processing associated with block 700, FIG. 5B shows the overall process associated with determining if a public health event exists. In block 710, the meta-data extracted from the input information is processed in the GDSP™ 100 using various detection algorithms for automated anomaly detection. Such detection algorithms may include searches for keywords, relationships between location of a data source and the time and date of the reported source, whether the reported event involves a human patient, number of human patients involved, etc. The detection algorithms may be pre-existing, or may be newly generated or modifications of an existing algorithm, depending on the nature of the processed meta-data, the underlying meta-data source, and the decisions of the GDSP™ administrators.

If the detection algorithms indicate an anomaly exists, block 750, then in block 760, notifications may be sent to associated systems and human users of the GDSP™ 100. These notifications may be made widely available to encourage general investigation and assessment, but are precursors to official GDSP™ alerts, which, as will be described later, require validation and authorization from an accredited GDSP™ user.

Associated with the process of analyzing the input meta-data are processes to maintain reference information, block 770, and to set up an analysis cycle, block 780. In maintaining reference information, block 770, the GDSP™ 100 provides users with reference information, such as disease(s)/condition(s) of interest lists, disease indicators, data sets to include in the analysis, and terminology lists and mappings. This process includes the set up of modeling parameters used as reference information for block 710 in both a default mode and a user-controlled mode. That is, for example, the automated processes of block 710 may execute using default parameters, which a GDSP™ user may choose to override.

In block 780, the GDSP™ 100 may use a standard interface(s) to set the parameters of the analysis cycle (e.g., in block 710) for various phases of a public health event. This process also provides the ability to set thresholds that serve as the dividing line between various public health event phases (e.g., disease outbreak and disease spread).

Whether or not an anomaly is determined at block 750, processing proceeds to block 800, wherein the potential for a public health event is evaluated.

FIG. 5C illustrates the steps associated with the process of detecting and investigating a potential public health event, block 800. In FIG. 5C, some steps may be performed manually, some automatically, and some may be both automated and manual. In block 810, the GDSP™ 100 applies automated detection algorithms to the input information to determine if an anomaly exists. The processing in block 810 may thus be viewed as a continuation of the automated anomaly detection process executed in block 710. If an anomaly is detected, block 812, then the process 800 moves to block 820. Note that the anomaly detection, block 812, is in addition to the anomaly detection of block 750. This additional anomaly detection takes account of the fact that additional and “fused” information sources may present a somewhat different picture of the threat to public health, and also the fact that GDSP™ users/analysts may add information/analyses to that already associated with the public health event at issue (for example, through use of the collaborative team room). If no anomaly is detected automatically, processing may return to block 810, and as additional meta-data are provided, the automated detection algorithms are re-executed. In block 820, if notifications of the anomaly have not already been made, then such notifications are sent.

Either in parallel with the automated anomaly detection processing and notification of blocks 810-820, or following the notification of block 820, the process 800 may move to block 830, and a GDSP™ user/analyst begins a detailed investigation into the potential public health event. The data review of block 830 may involve opening or establishing a collaborative team room (i.e., a virtual meeting room) in which appropriate GDSP™ users can view data associated with the potential public health event, contribute analyses, and provide additional information. The team room may be made to persist from the creation of the team room until its associated (potential or actual) public health event is over. Using the team room, and various tools (e.g., global mappings, geospatial and temporal graphing devices, data mining, reporting mechanisms, security mechanisms, and various detection and analysis algorithms) provided by the GDSP™ 100, the GDSP™ user/analyst can organize information into a single coherent picture and provide situational awareness and insight into the public health event. Furthermore, the GDSP™ users (data providers and/or data consumers) can use this team room throughout the life cycle of the public health event.

After an initial analysis phase, the process 800 moves to block 832, and the GDSP™ user/analyst determines if more information will be needed (which, generally, would be the case). If more information is needed, the GDSP™ user/analyst may communicate that need (block 834) using any conventional means, including emails, telephone calls, etc. The process 800 then returns to block 830. If the initial information provided at block 830 is sufficient, the process 800 moves to block 840, and the GDSP™ user/analyst reviews the various detection/analysis algorithms provided in the GDSP™ 100. If the GDSP™ user/analyst determines that one or more of the algorithms are acceptable, the GDSP™ user/analyst may apply the algorithms (block 850) to the input data associated with the potential public health event. If necessary (block 842), the GDSP™ user/analyst may modify an existing algorithm or generate a new algorithm (block 844). New or modified algorithms may be registered with the GDSP™ 100, but their use may be conditional until verified by an accredited GDSP™ user. Finally, following application of the algorithms, the GDSP™ user/analyst may determine (block 860) that more data are needed before proceeding to process 900, and thus the process 800 may return to block 830.

Associated with the process of detecting and analyzing a potential public health event is process step 870, establishing relationships among the information sources. This process step 870 enables GDSP™ users/analysts to manually connect information sources in the GDSP™ repository 120. This then enables all users to begin to see and communicate (collaborate) about emerging public health threats; for example, a user may link a news report about the status of a major flood in India to information on the spread of a viral infection among the population at risk from that flood. This process step 870 automatically captures details about the user who establishes a relationship (or set of relationships) and permits the user to add notes about the relationship. These relationships can be viewed using relationship visualization services. In addition to the manual process, an automated process could be configured upon training of the system over time.

While the steps 830-870 described above are implemented in a computer-aided fashion, in other embodiments, specific steps (e.g., block 870) or all steps may be automated in the GDSP™ 100.

FIG. 5D illustrates the public health decision process 900. The process 900 begins, block 905, when the GDSP™ 100 automatically, or a GDSP™ user/analyst manually, or through a combination of manual and automatic processes, provides notification of a public health event to the appropriate personnel and systems. The communication methods supported include, but are not limited to, email, pager, and telephone. In addition to providing the alert, GDSP™ 100 contains a list of people, organizations, and systems to be notified and/or uses the resources of an alerting system that is external to GDSP™ 100. The GDSP™ 100 includes a triage algorithm for an alerting and reporting system as well as the contact information and information requests for a diverse international population.

In the alert process 905, the automated mechanism by which algorithms can be triggered to dispatch and route alerts uses the definition of interest area and assignment of priority. The process 905 also works in conjunction with workflow services to ensure that mandatory policies for release and escalation are observed. The GDSP™ 100 generates alerts when algorithms flag potential health events. An alerting service dispatches alerts to those analysts and organizations who have registered an interest and are accordingly authorized. Workflow policies can be defined within GDSP™ 100 to provide mandatory policy rules for review before release of alerts, and escalation procedures if alerts are not acknowledged in time. Similarly, mandatory policy rules can determine whether alerts are sent prior to being characterized by an analyst. Depending on the output of the algorithm, alerts may be “packaged” with reports or other supplementary data that provide the justification for the alert. Similarly, the alert workflow capability will support review and release of information in a multi-level secure environment.

In block 910, the provided alert is verified. If the alert is credible, the public health event may be characterized in terms of the following:

1. Biological agent
2. Route of transmission
3. Source (e.g., release point)
4. Number of individuals affected

The characterization will be refined over time, as is demonstrated in the case of the July 1976 outbreak of Legionnaire's disease in Philadelphia illustrated in Table 1.

TABLE 1
Before and After Epidemiological Diagnoses for the July 1976 Outbreak of Legionnaire's Disease in Philadelphia

Initial epidemiological “working” diagnosis | Final epidemiological diagnosis (six months later)
Outbreak exists = true | Outbreak exists = true
Biological agent = the differential diagnosis of infectious pneumonia | Biological agent = L. pneumophila
Source = ? | Source = water cooling tower, Bellevue Stratford Hotel
Route of transmission = probably air | Route of transmission = air
Set of affected individuals >= 8 cases | Set of affected individuals = 180 cases

An associated subscription process provides GDSP™ users/analysts with a set of services that automatically disseminate data. Rather than a user manually looking for information of interest on a periodic basis, the subscription process enables a personalized set of agents to look constantly for information and then generate a notification to the user that data of interest is available, and/or push the data to the user. This can be done, for example, using a REST API, RSS, or by manual extraction. This increases a GDSP™ user's productivity through the elimination of constant manual “polling” for data. This process also provides the architectural underpinning for supporting collaborative communities of interest, as well as bidirectional interactions with other sector-specific agencies.
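The push model described above, in which registered interests replace manual polling, can be sketched as a small in-process publish/subscribe service. The class, its method names, and the dictionary-based record format are hypothetical; a deployed system would deliver over a transport such as a REST endpoint or an RSS feed rather than a Python callback.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


class SubscriptionService:
    """Hypothetical agent registry: users register an interest and a delivery hook."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Callable[[Record], bool],
                                        Callable[[str, Record], None]]] = []

    def subscribe(self, user: str, is_of_interest: Callable[[Record], bool],
                  deliver: Callable[[str, Record], None]) -> None:
        self._subscriptions.append((user, is_of_interest, deliver))

    def publish(self, record: Record) -> List[str]:
        """Push a new record to every subscriber whose interest filter matches."""
        notified = []
        for user, is_of_interest, deliver in self._subscriptions:
            if is_of_interest(record):
                deliver(user, record)
                notified.append(user)
        return notified


inbox: List[Tuple[str, str]] = []
service = SubscriptionService()
service.subscribe(
    "analyst-1",
    lambda r: r.get("disease") == "avian flu",
    lambda user, r: inbox.append((user, r["region"])),
)
service.subscribe(
    "analyst-2",
    lambda r: r.get("region") == "Philadelphia",
    lambda user, r: inbox.append((user, r["region"])),
)
notified = service.publish({"disease": "avian flu", "region": "Hanoi"})
```

Each subscriber supplies its own filter, so adding a new community of interest does not require changes to the publishing side.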

The GDSP™ 100 supports (block 915) response planning and public health event monitoring by managing critical information about confirmed events, such as outbreaks, and communications between international public health professionals for informing actions to limit the spread of the outbreak and mitigate the health, social, and economic impacts of a pandemic. This process provides:

-   -   The use of models (see Analyze Input Information) to make informed inferences about disease spread as the event and event response progress. These models provide insight into which control strategies might be effective in slowing spread.
    -   Assistance to public health and response authorities with the implementation of travel-related and community containment measures through the use of interactive maps that are linked to data about quarantine areas, school and airport locations and closings.
    -   The GDSP™ 100 can be used to assess the capacity of state and local medical and emergency response systems to meet expected need during a public health threat event. The GDSP™ 100 can also be used to track the availability and location of personnel, areas with patient visit surges, and the beds within healthcare facilities.
    -   The GDSP™ 100 can facilitate and manage the supply of essential materials to event response sites, transport of laboratory specimens from the field to appropriate diagnostic facilities, the organization of treatment (vaccination) programs, or deployment of teams for disease control.
    -   Using the notification ability, the GDSP™ 100 can give public health and response authorities the ability to request assistance from U.S. federal teams including the Commissioned Corps and Medical Reserve Corps as well as those making ready Federal Medical Contingency Stations.
    -   The GDSP™ 100 can facilitate the aggregation and communication of rapid treatment effectiveness studies and reports of adverse events following treatments, including substance (vaccine, antivirals, etc.) administrations and dispensations.

Finally, in block 920, the GDSP™ 100 archives information on anomalies detected, decisions made, and actions taken. This information can be queried during the event and afterwards for evaluation purposes. Post-event evaluation has much broader applications than only the refinement of algorithms; this evaluation also provides a powerful means for improving preparedness and response strategies to health threats. Correlation of pre-event data with data recorded during response and recovery provides evidence-based validation for those factors which best minimize the impact of an outbreak. Analysis of data queries and requested reports during an outbreak response will identify data streams that need to be brought into the GDSP™ 100.
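One possible way to archive and later query anomalies, decisions, and actions is sketched below using Python's standard sqlite3 module. The table name event_log, its columns, and the helper functions are hypothetical; the description above does not specify how the GDSP™ 100 stores this information.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a small archive; an in-memory database by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS event_log (
               event_id    TEXT NOT NULL,
               kind        TEXT NOT NULL
                           CHECK (kind IN ('anomaly', 'decision', 'action')),
               recorded_at TEXT NOT NULL,   -- ISO 8601 timestamp
               detail      TEXT NOT NULL
           )"""
    )
    return conn


def record(conn, event_id, kind, recorded_at, detail):
    """Archive one anomaly, decision, or action for an event."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO event_log VALUES (?, ?, ?, ?)",
            (event_id, kind, recorded_at, detail),
        )


def history(conn, event_id, kind=None):
    """Return archived entries for an event in time order, optionally by kind."""
    sql = "SELECT kind, recorded_at, detail FROM event_log WHERE event_id = ?"
    params = [event_id]
    if kind is not None:
        sql += " AND kind = ?"
        params.append(kind)
    sql += " ORDER BY recorded_at"
    return conn.execute(sql, params).fetchall()
```

The same history query serves both uses named above: it can be run while the event is in progress and again afterwards for post-event evaluation.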

FIG. 5E illustrates the last process steps of the process 600, namely communicate information, block 1010, and adding to an event report, block 1020. The communicate information process 1010 allows a GDSP™ user/analyst to share information in various reporting formats including screen shots, maps, a standard report, and any data used to create the information with applied algorithm(s). Collaborations between analysts and between organizations are supported by the GDSP™ characterization phase. In addition, the GDSP™ 100 supports decision aids and requests for enhanced data collection to provide more analytical capability. Several of the characterization activities involve bidirectional capabilities. The GDSP™ 100 will also start archiving data associated with the potential event. The add to report process 1020 provides a user-driven control to add the GDSP™-presented information to a reporting template in preparation for sending the report.

FIG. 6 illustrates various phases during a public health event that are included within the GDSP™ 100 method and implementation. These phases include:

Monitor to Detect an Outbreak (Early Event (Outbreak) Detection)

-   -   Monitoring to detect an event tracks the current health of a jurisdiction in order to find or identify events of concern to public health. This phase includes all the features and functions needed to collect data from source systems, including organizations and people, consolidate the collected information into a coherent picture, and present the information so that a knowledgeable person, generally an epidemiologist, can interpret the presented information in order to detect an event.

Monitor Progress of an Outbreak

-   -   This phase tracks the progress of an event by monitoring the effect of any event investigations and responses while continuing to monitor the current situation. Since the Monitor to Detect an Outbreak phase also monitors the current situation, this phase includes the features and functions of that phase. In order to monitor the progress of an event, the Monitor to Detect an Outbreak phase communicates all known or suspected event information to the Monitor Progress of an Outbreak phase.

Monitor Outbreak Preparedness

-   -   This phase involves continual monitoring by an agency, jurisdiction, or organization of its readiness to respond to an event. This phase involves monitoring emergency response planning, training, and overall response capacity.

Respond to and Manage Response to an Outbreak

-   -   Event response involves many teams across many disciplines and with many purposes. In order to be effective, event response must be managed in a clear, effective manner. This phase provides for initiating an event response and managing the response, including cross-jurisdictional responses.

Table 2 summarizes these phases.

TABLE 2: GDSP™ Public Health Event Phase Summary

Phase columns, in order: Monitor to Detect an Outbreak; Monitor Progress of an Outbreak; Manage Outbreak Response; Monitor Outbreak Preparedness

GDSP™ Processes:

-   Capture Information: X X X X
-   Transform Incoming Information: X X X X
-   Analyze Input Information: X X X X
-   Maintain Reference Information: X
-   Create/Validate/Archive Algorithm(s): X X
-   Set up Analysis Cycle: X X
-   Investigate Potential PH Threat Event & Detect Potential PH Threat Event: X X
-   Provide PH Threat Event Alert: X
-   Manage Response to PH Threat Event: X
-   Archive PH Threat Event Information: X
-   Communicate Information: X X X X
-   Add to Report: X X X X

As can be seen from Table 2, certain of the GDSP™ functions illustrated in FIG. 4, and described in FIGS. 5A-5E, are executed by the GDSP™ 100 during each of the public health event phases, while others relate to fewer than all the phases.

FIG. 7 is a sample alert feed used with the GDSP™ 100. As shown in FIG. 7, the alert is a formatted XML message that identifies the location of the potential public health event, and specific information relating to the number of victims. Other alerts may be formatted in differing fashions, and may contain additional information regarding the event.
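A consumer of such a feed might read the location and victim count along the following lines. This Python sketch uses the standard xml.etree.ElementTree module. The element and attribute names (alert, location, country, victims) and the sample message are hypothetical, since the actual schema of FIG. 7 is not reproduced in this text.

```python
import xml.etree.ElementTree as ET

# Hypothetical alert message; the real FIG. 7 schema may differ.
SAMPLE_ALERT = """<alert>
  <location country="US">Philadelphia</location>
  <victims>8</victims>
</alert>"""


def parse_alert(xml_text):
    """Extract the event location and number of victims from an alert message."""
    root = ET.fromstring(xml_text)
    location = root.find("location")
    victims = root.findtext("victims")
    has_place = location is not None and location.text
    return {
        "place": location.text.strip() if has_place else None,
        "country": location.get("country") if location is not None else None,
        "victims": int(victims) if victims is not None else None,
    }
```

Missing elements are returned as None rather than raising an error, which accommodates alerts that are formatted differently or carry other information.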

FIGS. 8-20 are user interfaces that illustrate features and functions of the GDSP™ 100 of FIG. 2. FIG. 8 shows the overall usage of the components by the browser 295. The browser 295 interacts with the AJAX components (258) and the overall processing components 250 to communicate requests to Google Map (243) and Yahoo services (291, 293, 295). In addition, the browser 295 can call the processing components 250 directly.

FIGS. 9-20 show the visual representation of what a GDSP™ user will be able to perform once the GDSP™ 100 is displayed on the browser 295. For example, a Google Map 245 is displayed on the browser 295. The user can interact with the map 245 through the use of the AJAX utility 258, thereby adding markers such as traffic (293) and places (295). High level map technology integration can be viewed in FIGS. 8-14 while layer

1: A global disease surveillance platform, comprising: a platform processor, wherein potential public health events are identified, determined, analyzed, and wherein responses to the public health events are monitored; an interface coupled to the platform processor, wherein the interface receives external information feeds comprising structured and unstructured data, and wherein meta-data are extracted from the structured and unstructured data, indexed, and related back to the structured and unstructured data; an external services module that provides services to facilitate the responses; and a storage device, wherein meta-data from the structured and unstructured data, and the structured and unstructured data are stored.

2: The platform of claim 1, wherein the external services comprise geo-spatial services.

3: The platform of claim 1, further comprising: a data transformation module that transforms data from the structured and unstructured data sources into a schema consistent with a schema of the platform; a processing component coupled to the interface, comprising: analysis algorithms, the analysis algorithms applied to the meta-data, an alert module, wherein when a threshold, as indicated by application of the algorithms to the meta-data is exceeded, a public health alert is sounded, a data fusion module that identifies, evaluates, tags, and correlates the structured and the unstructured data, and the corresponding meta-data to produce a data file related to a specific public health event, and access modules that operate to allow real-time access to the data file, wherein a response to the public health event is managed from pre-planning, detection, and response.

4: An apparatus for managing phases of a public health event, the apparatus including one or more suitably programmed computing devices, the apparatus comprising: an interface that receives structured and unstructured data from one or more external data sources, the interface, comprising: a data transformation module that transforms data from the structured and unstructured data sources into a schema consistent with a schema of the apparatus, and a data classification module that extracts meta-data related to the structured and unstructured data and creates an index of the meta-data back to the meta-data's structured or unstructured data; a data store coupled to the interface, wherein the indexed meta-data and the structured and unstructured data are stored; a processing component coupled to the interface, comprising: analysis algorithms, the analysis algorithms applied to the meta-data, an alert module, wherein when a threshold, as indicated by application of the algorithms to the meta-data is exceeded, a public health alert is sounded, and access modules that operate to allow real-time access to the structured and unstructured data, and to the corresponding meta-data, wherein a response to the public health event is managed from pre-planning, identification, detection, and response.

5: The apparatus of claim 4, further comprising external services coupled to the processing component, wherein the processing component receives geo-spatial information and wherein the processing component operates to populate one or more geo-spatial products with the extracted meta-data, wherein the populated geo-spatial products provide visual and temporal displays of progress of the public health event, including indications of populations at risk from the public health event, and wherein the geo-spatial products allow monitoring of effectiveness of response actions.

6: The apparatus of claim 5, wherein the populated geo-spatial products further provide visual indication of locations of public health response assets.

7: The apparatus of claim 4, further comprising notification modules that operate to provide the alert to one or more users of the apparatus.

8: The apparatus of claim 7, wherein the notification module incorporates a triage algorithm for notification of the users.

9: The apparatus of claim 4, wherein the index provides retrieval of structured and unstructured data, corresponding to the indexed meta-data, from the data store, for further analysis.

10: The apparatus of claim 4, wherein the algorithms comprise means for back tracking from a current status of a public health event to locate a source and time of first occurrence of the event.

11: The apparatus of claim 4, further comprising a virtual meeting room, wherein analysis of the public health event-related data are displayed and information related to the public health event are received from users of the apparatus.

12: The apparatus of claim 4, further comprising means for modifying existing analysis algorithms and creating new analysis algorithms.

13: A method for managing a response to a public health event during an entire life cycle of the event, the method executed on one or more computing devices, the method comprising: receiving information contained in one or more structured and unstructured data sources; initially processing the information, comprising: extracting meta-data from the data sources, wherein the meta-data are linked to their corresponding data source, transforming the extracted meta-data, classifying the transformed meta-data, and storing the indexed meta-data and their corresponding data source, wherein the index allows retrieval of the corresponding data source; and analyzing the meta-data to determine if a threshold value indicative of a public health event has been exceeded, wherein if the threshold has been exceeded, providing an initial public health event alert, and continuing to collect, process, and analyze information to allow management of the response.

14: The method of claim 13, further comprising: applying one or more detection algorithms to the meta-data to determine the extent of the public health event and to prepare and subsequently manage the response to the public health event; and archiving information related to the public health event.

15: The method of claim 13, wherein the public health event is one of acute, mild, and chronic conditions, wherein the public health event affects one or more of humans, animals, and the environment, and wherein the public health event is caused by one or more of natural, technological, man-made, and bio-terrorism mechanism.